Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine concurrency logic #5063

Merged
merged 8 commits into from
Feb 16, 2024

Conversation

adamfarley
Copy link
Contributor

This change ensures that we identify the correct memory size when running tests in containers, by setting CGMEM to be the number of memory bytes in all circumstances.

We could remove the "-lt" logic altogether, but we shouldn't because sometimes the cgroup limits are set far beyond the machine's maximum memory, for reasons we do not yet understand.

We need to make sure we're not using so many cores that the tests
are starved of memory.

Signed-off-by: Adam Farley <[email protected]>
Instead of what it does right now, which is to compare bytes with
kilobytes, and incorrectly assume that the kilobytes number is always
smaller.

Signed-off-by: Adam Farley <[email protected]>
@adamfarley
Copy link
Contributor Author

Linked to #5012

May not close it. Further review required. Should improve matters, though.

@adamfarley
Copy link
Contributor Author

@adamfarley
Copy link
Contributor Author

Yep, that worked. Now sets concurrency to a more sensible level. Ready to merge.

Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/sys/fs/cgroup/memory/memory.limit_in_bytes is inherently cgv1 (legacy) specific. This won't work on cgroup v2 systems. FYI.

@sxa
Copy link
Member

sxa commented Feb 14, 2024

Ref Severin's comment above, /sys/fs/cgroup/memory.max appears to be the right thing for V2 so we ought to change that (either in this PR or a separate one)
Noting that this is a follow on to other fixes in this area under #4933 and #5035

Copy link
Contributor

@andrew-m-leonard andrew-m-leonard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@sxa sxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamfarley Because I'm paranoid about these things can you fire this off on a subset of diverse machines similar to the table at the top of #4933 to check that this won't regress anything please?

Suggest one of the PLCT RISC-V machines (4 core, 16Gb), test-sxa-armv7l-ubuntu2004-odroid-2 (8 core, 2Gb), something running on an Ubuntu 2004 host and something on Ubuntu 22.04 as a minimum.

@adamfarley
Copy link
Contributor Author

Ref Severin's comment above, /sys/fs/cgroup/memory.max appears to be the right thing for V2 so we ought to change that (either in this PR or a separate one) Noting that this is a follow on to other fixes in this area under #4933 and #5035

Okie dokie. Will fix.

@adamfarley
Copy link
Contributor Author

@adamfarley Because I'm paranoid about these things can you fire this off on a subset of diverse machines similar to the table at the top of #4933 to check that this won't regress anything please?

Suggest one of the PLCT RISC-V machines (4 core, 16Gb), test-sxa-armv7l-ubuntu2004-odroid-2 (8 core, 2Gb), something running on an Ubuntu 2004 host and something on Ubuntu 22.04 as a minimum.

Gotcha. Will run with the updated fix.

@adamfarley
Copy link
Contributor Author

Ok, given that we (a) want to tolerate both memory.max AND memory.etcetcbytes, and (b) that memory.max can be both present, readable and empty, here's a proposal for the new script:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`; 
elif [[ -r /sys/fs/cgroup/memory/memory.limit_in_bytes && $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`; 
else 
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

I added formatting for an easier review.

What do you think?

@adamfarley
Copy link
Contributor Author

P.S. If I ever complain I have too much spare time, remind me of this makefile. Turning it into a bash script would remove the need to put all of this script into a single line.

In cgroup v1 we used memory.limit_in_bytes to store the maximum
memory allocated to the container. In v2, we use memory.max. This
change allows us to check for both, includes a meminfo check for
non-containers, and adds a few debug comments so we can be sure this
new code is working.

My plan is to run tests on a diverse set of machines, and to remove
the debug statements prior to merging.

Signed-off-by: Adam Farley <[email protected]>
@adamfarley
Copy link
Contributor Author

Test run: https://ci.adoptium.net/job/Grinder/8832/

Will check across other machines once I know the new code works.

@jerboaa
Copy link
Contributor

jerboaa commented Feb 14, 2024

Ok, given that we (a) want to tolerate both memory.max AND memory.etcetcbytes, and (b) that memory.max can be both present, readable and empty, here's a proposal for the new script:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`; 
elif [[ -r /sys/fs/cgroup/memory/memory.limit_in_bytes && $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`; 
else 
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

I added formatting for an easier review.

What do you think?

You should test on the FS type on the cgroup root. In pseudo code:

if [ "$(stat -f --format '%T' /sys/fs/cgroup)_" == "cgroup2fs_" ]; then
   #cgroup v2 branch
else
   #cgroup v1 branch
fi

@adamfarley
Copy link
Contributor Author

adamfarley commented Feb 14, 2024

Ok, had a ponder and came up with this:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -f "/.dockerenv" ]]; then
    CGRPV=`stat -f --format '%T' /sys/fs/cgroup`;
    if [[ "_$$CGRPV" == "_cgroup2fs" ]]; then
        CGMEM=`cat /sys/fs/cgroup/memory.max 2>1`; 
    elif [[ "_$$CGRPV" == "_tmpfs" ]]; then 
        CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>1`; 
    fi; 
fi;

if [[ ! $$(CGMEM) =~ ^[0-9]+$$ ]]; then
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

This way we:

  • Check if we're in a container before running the stat command, which not all OS' have.
  • Can handle things if the file is missing, unreadable, or empty.
  • Don't use cgroups contents if we're not in a container (some OS' have that folder even outside of any containers).

Added some debug code (so we know we're following the right code path) and ran to test it here: https://ci.adoptium.net/job/Grinder/8836/

To ensure we're looking in the correct file for the maximum memory
size. Also to handle permissions issues, empty files, etc.

Signed-off-by: Adam Farley <[email protected]>
Signed-off-by: Adam Farley <[email protected]>
@jerboaa
Copy link
Contributor

jerboaa commented Feb 14, 2024

if [[ -f "/.dockerenv" ]]; then

Means you assume docker as the runtime. Fails once moved to podman or something else, FWIW. Not sure if that's an issue worth addressing at this point.

@sxa
Copy link
Member

sxa commented Feb 14, 2024

You should test on the FS type on the cgroup root. In pseudo code:

Do you think that's going to trap additional situations compared to just checking for the presence of memory.max? Given where this code is (on one line for now) my preference would probably be to just check one or the other given that this is already quite complex for one line unless we need both checks.

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`; 

We should avoid mixing $(...) and the backquote syntax in this. I also don't personally like using bash regex is bash but I guess that's just personal preference and that ship has sailed elsewhere in our code.

if [[ -f "/.dockerenv" ]]; then

Means you assume docker as the runtime. Fails once moved to podman or something else

Is that check even necessary?

@adamfarley
Copy link
Contributor Author

adamfarley commented Feb 15, 2024

Taking into consideration Stewart and Severin's points, I'm testing this:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max ]]; then
    CGMEM=`cat /sys/fs/cgroup/memory.max 2>1`; 
else
    CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>1`; 
fi; 

if [[ ! $$(CGMEM) =~ ^[0-9]+$$ ]]; then
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

This removes the cgroup version check, but also removes the need for it (as we now have error-handling).

One fewer man-page check (for stat) and one fewer "if" layer, to make debugging easier.

Test run: https://ci.adoptium.net/job/Grinder/8847/

Makes it easier to maintain, and more resilient to failure.

I've also added a commented, formatted copy of the script to help
people read it in the future.

Signed-off-by: Adam Farley <[email protected]>
@adamfarley
Copy link
Contributor Author

Also, I added the formatted copy of the script into a big comment, to make maintenance easier in the future.

@adamfarley
Copy link
Contributor Author

adamfarley commented Feb 16, 2024

Ok, typo fixed and code works. All debug comments removed, and tested here:

https://ci.adoptium.net/job/Grinder/8850/console

Requesting merge now.

@andrew-m-leonard andrew-m-leonard merged commit 787c3a5 into adoptium:master Feb 16, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants